• Masked Sentence Model based on BERT for Move Recognition in Medical Scientific Abstracts

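    Category: Computer Science >> Natural Language Understanding and Machine Translation Category: Library Science, Information Science >> Methods and Equipment for Automating Information Processes Submitted: 2019-10-29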

    Abstract: Purpose: Move recognition in scientific abstracts is an NLP task that classifies the sentences of an abstract into different types of language units. To improve the performance of move recognition in scientific abstracts, we propose a novel move recognition model that outperforms the BERT-Base method. Design: Prevalent BERT-based models for sentence classification often classify sentences without considering their context. In this paper, inspired by BERT's Masked Language Model (MLM), we propose a novel model called the Masked Sentence Model, which integrates the content and contextual information of sentences in move recognition. Experiments are conducted on the benchmark dataset PubMed 20K RCT in three steps. We then compare our model with HSLN-RNN, BERT-Base and SciBERT on the same dataset. Findings: The F1 score of our model is higher than those of BERT-Base and SciBERT by 4.96% and 4.34%, respectively, which shows the feasibility and effectiveness of the novel model. Our result also comes closest to the current state-of-the-art results of HSLN-RNN. Research Limitations: The sequential features of move labels are not considered, which might be one of the reasons why HSLN-RNN performs better. Our model is also restricted to biomedical English literature, because we fine-tune it on a dataset from PubMed, a typical biomedical database. Practical implications: The proposed model is better and simpler at identifying move structure in scientific abstracts, and is worth applying in text classification experiments that need to capture contextual features of sentences. Originality: The study proposes a Masked Sentence Model based on BERT that takes account of the contextual features of the sentences in abstracts in a new way. The performance of this classification model is significantly improved by rebuilding the input layer without changing the structure of the neural network.

  • Organization and Exploration Fined-grained Historical Knowledge on Contemporary China Based on Semantic Mining

    Category: Library Science, Information Science >> Library Science Submitted: 2017-10-20

    Abstract: China has a huge volume of historical resources on its contemporary history. Much valuable knowledge is hidden in those resources and cannot be utilized easily. There is an urgent need to mine the implicit semantic knowledge scattered across a large number of historical resources and to reorganize the historical knowledge and facts in a fine-grained manner, so that users can explore the historical knowledge for research and education. This paper proposes a method, called “Mining down, Organizing up”, to semantically represent and organize the historical knowledge on contemporary China hidden in historical encyclopedia text. Based on the proposed historical ontology of contemporary China, this method extracts knowledge objects and facts from unstructured historical text items using text mining technologies, represents the historical knowledge in a semantically enriched way, and interlinks the related historical knowledge objects and facts to form a historical knowledge network of contemporary China. By mining the historical facts and the historical knowledge network, the authors obtain more valuable patterns from the historical knowledge, which can be used to form a new organization scheme that reorganizes the historical knowledge in a more vivid way. Based on this method, the authors developed a system that represents and organizes historical knowledge of contemporary China in a fine-grained manner and supports users in exploring historical knowledge through functions such as semantic retrieval, clustering of historical objects and facts, visual navigation, association analysis, and reconstruction of chronicle facts.